
 speeding


Speeding Up Latent Variable Gaussian Graphical Model Estimation via Nonconvex Optimization

Neural Information Processing Systems

We study the estimation of the latent variable Gaussian graphical model (LVGGM), where the precision matrix is the superposition of a sparse matrix and a low-rank matrix. In order to speed up the estimation of the sparse plus low-rank components, we propose a sparsity constrained maximum likelihood estimator based on matrix factorization and an efficient alternating gradient descent algorithm with hard thresholding to solve it. Our algorithm is orders of magnitude faster than the convex relaxation based methods for LVGGM. In addition, we prove that our algorithm is guaranteed to linearly converge to the unknown sparse and low-rank components up to the optimal statistical precision. Experiments on both synthetic and genomic data demonstrate the superiority of our algorithm over the state-of-the-art algorithms and corroborate our theory.
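The hard thresholding step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common form of the operator, which keeps the `s` largest-magnitude entries of a matrix and zeroes the rest. The function name `hard_threshold` and the list-of-lists matrix representation are illustrative assumptions, not the authors' implementation.

```python
def hard_threshold(matrix, s):
    """Keep the s largest-magnitude entries of `matrix`; set the rest to zero.

    Illustrative sketch only: `matrix` is a list of rows, and `s` is the
    assumed sparsity budget (number of entries allowed to stay nonzero).
    """
    s = max(s, 0)
    # Collect (magnitude, row, col) for every entry.
    entries = [(abs(v), i, j)
               for i, row in enumerate(matrix)
               for j, v in enumerate(row)]
    # Largest magnitude first; ties broken by position so the result is deterministic.
    entries.sort(key=lambda t: (-t[0], t[1], t[2]))
    keep = {(i, j) for _, i, j in entries[:s]}
    return [[v if (i, j) in keep else 0.0 for j, v in enumerate(row)]
            for i, row in enumerate(matrix)]
```

In an alternating scheme of the kind the abstract describes, a projection like this would plausibly follow each gradient update of the sparse component so that the sparsity constraint holds at every iterate.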


Waymo's Robotaxis Can Now Use the Highway, Speeding Up Longer Trips

WIRED

The Alphabet company's self-driving cars are opening up shop in more and more cities. When Google's self-driving car project began testing in the Bay Area back in 2009, its engineers focused on highways by sending its sensor-laden vehicles cruising down Interstate 280, which runs the length of Silicon Valley's peninsula. More than 15 years later, the cars are back on the freeway, this time without drivers. On Tuesday, the project, now an Alphabet subsidiary we all know as Waymo, announced that its robotaxi service would now drive on freeways in the San Francisco Bay Area, Los Angeles, and Phoenix. The new service marks another technical leap for Waymo, whose robotaxis currently serve five US metros: Atlanta, Austin, Los Angeles, Phoenix, and the San Francisco Bay Area.


QWO: Speeding Up Permutation-Based Causal Discovery in LiGAMs

Neural Information Processing Systems

Causal discovery is essential for understanding relationships among variables of interest in many scientific domains. In this paper, we focus on permutation-based methods for learning causal graphs in Linear Gaussian Acyclic Models (LiGAMs), where the permutation encodes a causal ordering of the variables. Existing methods in this setting are not scalable due to their high computational complexity. These methods are comprised of two main components: (i) constructing a specific DAG, $\mathcal{G}^\pi$, for a given permutation $\pi$, which represents the best structure that can be learned from the available data while adhering to $\pi$, and (ii) searching over the space of permutations (i.e., causal orders) to minimize the number of edges in $\mathcal{G}^\pi$. We introduce QWO, a novel approach that significantly enhances the efficiency of computing $\mathcal{G}^\pi$ for a given permutation $\pi$.


SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference

arXiv.org Artificial Intelligence

We present SuffixDecoding, a novel model-free approach to accelerating large language model (LLM) inference through speculative decoding. Unlike existing methods that rely on draft models or specialized decoding heads, SuffixDecoding leverages suffix trees built from previously generated outputs to efficiently predict candidate token sequences. Our approach enables flexible tree-structured speculation without the overhead of maintaining and orchestrating additional models. SuffixDecoding builds and dynamically updates suffix trees to capture patterns in the generated text, using them to construct speculation trees through a principled scoring mechanism based on empirical token frequencies. SuffixDecoding requires only CPU memory which is plentiful and underutilized on typical LLM serving nodes. We demonstrate that SuffixDecoding achieves competitive speedups compared to model-based approaches across diverse workloads including open-domain chat, code generation, and text-to-SQL tasks. For open-ended chat and code generation tasks, SuffixDecoding achieves up to $1.4\times$ higher output throughput than SpecInfer and up to $1.1\times$ lower time-per-token (TPOT) latency. For a proprietary multi-LLM text-to-SQL application, SuffixDecoding achieves up to $2.9\times$ higher output throughput and $3\times$ lower latency than speculative decoding. Our evaluation shows that SuffixDecoding maintains high acceptance rates even with small reference corpora of 256 examples, while continuing to improve performance as more historical outputs are incorporated.
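The core idea of drafting tokens from previously generated outputs can be sketched as follows. This is a deliberately simplified stand-in: it uses a dictionary of short contexts with follower counts rather than a suffix tree, drafts a single greedy sequence rather than a speculation tree, and omits verification by the LLM. The names `build_index` and `propose` and the parameters `max_ctx` and `max_draft` are hypothetical and do not come from SuffixDecoding.

```python
from collections import Counter, defaultdict

def build_index(corpus, max_ctx=3):
    """Map each context of 1..max_ctx tokens to a Counter of the tokens that followed it."""
    index = defaultdict(Counter)
    for seq in corpus:
        for pos in range(1, len(seq)):
            for n in range(1, max_ctx + 1):
                if pos - n < 0:
                    break
                index[tuple(seq[pos - n:pos])][seq[pos]] += 1
    return index

def propose(index, prefix, max_ctx=3, max_draft=4):
    """Greedily draft up to max_draft tokens, preferring the longest matching suffix."""
    tokens = list(prefix)
    draft = []
    for _ in range(max_draft):
        nxt = None
        # Try the longest available context first, then back off to shorter ones.
        for n in range(min(max_ctx, len(tokens)), 0, -1):
            followers = index.get(tuple(tokens[-n:]))
            if followers:
                # Pick the empirically most frequent follower of this context.
                nxt = followers.most_common(1)[0][0]
                break
        if nxt is None:
            break  # No historical match: stop drafting.
        draft.append(nxt)
        tokens.append(nxt)
    return draft
```

For example, with a history of three outputs, two of which read `select * from users` and one `select * from orders`, calling `propose(index, ["select"])` drafts `["*", "from", "users"]`. In a real speculative decoding loop the target model would then verify these drafted tokens and accept the longest correct prefix.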


Reviews: Speeding Up Latent Variable Gaussian Graphical Model Estimation via Nonconvex Optimization

Neural Information Processing Systems

The paper considers learning the dependency structure of Gaussian graphical models where some variables are latent. Directly applying the usual assumption of sparsity in the precision matrix is difficult because variables that appear correlated might actually both depend on a common latent variable. Previously, Chandrasekaran et al. proposed estimating the model structure by decomposing the full precision matrix into the sum of a sparse matrix and a low-rank matrix. Likelihood is maximized while the components of the sparse matrix are penalized with an $\ell_1$ regularizer and the low-rank matrix is penalized with a nuclear norm. Computing the proximal operator to update the low-rank component requires performing SVD in $O(d^3)$ time at each iteration. The authors propose replacing the low-rank component with its Cholesky decomposition $ZZ^\top$ and finding $Z$ directly.


AccEPT: An Acceleration Scheme for Speeding Up Edge Pipeline-parallel Training

arXiv.org Artificial Intelligence

It is usually infeasible to fit and train an entire large deep neural network (DNN) model using a single edge device due to the limited resources. To facilitate intelligent applications across edge devices, researchers have proposed partitioning a large model into several sub-models, and deploying each of them to a different edge device to collaboratively train a DNN model. However, the communication overhead caused by the large amount of data transmitted from one device to another during training, as well as the sub-optimal partition point due to the inaccurate latency prediction of computation at each edge device, can significantly slow down training. In this paper, we propose AccEPT, an acceleration scheme for edge collaborative pipeline-parallel training. In particular, we propose a light-weight adaptive latency predictor to accurately estimate the computation latency of each layer at different devices, which also adapts to unseen devices through continuous learning. Therefore, the proposed latency predictor leads to better model partitioning which balances the computation loads across participating devices. Moreover, we propose a bit-level computation-efficient data compression scheme to compress the data to be transmitted between devices during training. Our numerical results demonstrate that our proposed acceleration approach is able to speed up edge pipeline-parallel training by up to 3 times in the considered experimental settings.


Speeding Up Speech Synthesis In Diffusion Models By Reducing Data Distribution Recovery Steps Via Content Transfer

arXiv.org Artificial Intelligence

Diffusion based vocoders have been criticised for being slow due to the many steps required during sampling. Moreover, the model's loss function that is popularly implemented is designed such that the target is the original input $x_0$ or error $\epsilon_0$. For early time steps of the reverse process, this results in large prediction errors, which can lead to speech distortions and increase the learning time. We propose a setup where the targets are the different outputs of forward process time steps with a goal to reduce the magnitude of prediction errors and reduce the training time. We use the different layers of a neural network (NN) to perform denoising by training them to learn to generate representations similar to the noised outputs in the forward process of the diffusion. The NN layers learn to progressively denoise the input in the reverse process until finally the final layer estimates the clean speech. To avoid 1:1 mapping between layers of the neural network and the forward process steps, we define a skip parameter $\tau>1$ such that an NN layer is trained to cumulatively remove the noise injected in the $\tau$ steps in the forward process. This significantly reduces the number of data distribution recovery steps and, consequently, the time to generate speech. We show through extensive evaluation that the proposed technique generates high-fidelity speech in competitive time that outperforms current state-of-the-art tools. The proposed technique is also able to generalize well to unseen speech.


Speeding up the Parti-Game Algorithm

Neural Information Processing Systems

In this paper, we introduce an efficient replanning algorithm for nondeterministic domains, namely what we believe to be the first incremental heuristic minimax search algorithm. We apply it to the dynamic discretization of continuous domains, resulting in an efficient implementation of the parti-game reinforcement-learning algorithm for control in high-dimensional domains.


Speeding up the time to perform MRI scans with AI-assisted technology – JD Supra

#artificialintelligence

It appears that when machine learning is used to reconstruct MRI images, albeit at a faster pace and with less imaging data acquisition than …


Speeding Up Recommender Systems Using Association Rules

arXiv.org Artificial Intelligence

Recommender systems are considered one of the most rapidly growing branches of Artificial Intelligence. The demand for finding more efficient techniques to generate recommendations becomes urgent. However, many recommendations become useless if there is a delay in generating and showing them to the user. Therefore, we focus on improving the speed of recommendation systems without impacting the accuracy. In this paper, we suggest a novel recommender system based on Factorization Machines and Association Rules (FMAR). We introduce an approach to generate association rules using two algorithms: (i) apriori and (ii) frequent pattern (FP) growth. These association rules will be utilized to reduce the number of items passed to the factorization machines recommendation model. We show that FMAR has significantly decreased the number of new items that the recommender system has to predict and hence, decreased the required time for generating the recommendations. On the other hand, while building the FMAR tool, we concentrate on making a balance between prediction time and accuracy of generated recommendations to ensure that the accuracy is not significantly impacted compared to the accuracy of using factorization machines without association rules.
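The idea of using association rules to shrink the candidate set before scoring can be sketched as follows. This is not FMAR and uses neither Apriori nor FP-growth: it mines single-item rules `a -> b` by brute-force pair counting, and the factorization machine scoring stage is omitted. The names `pair_rules` and `candidate_items` and the default thresholds are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import permutations

def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return {antecedent: set(consequents)} for single-item rules a -> b.

    support(a -> b)    = fraction of transactions containing both a and b
    confidence(a -> b) = count(a and b) / count(a)
    """
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for t in transactions:
        items = set(t)
        item_count.update(items)
        # Ordered pairs, so a -> b and b -> a are counted separately.
        pair_count.update(permutations(sorted(items), 2))
    rules = {}
    for (a, b), c in pair_count.items():
        if c / n >= min_support and c / item_count[a] >= min_confidence:
            rules.setdefault(a, set()).add(b)
    return rules

def candidate_items(rules, basket):
    """Items implied by the basket via the rules, excluding items already in it."""
    out = set()
    for item in basket:
        out |= rules.get(item, set())
    return out - set(basket)
```

Only the items returned by `candidate_items` would then be passed to the downstream recommendation model, which is where the reduction in prediction time described above would come from.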