STAR: Spectral Truncation and Rescale for Model Merging

Lee, Yu-Ang, Ko, Ching-Yun, Pedapati, Tejaswini, Chung, I-Hsin, Yeh, Mi-Yen, Chen, Pin-Yu

arXiv.org Artificial Intelligence

Model merging is an efficient way of obtaining a multi-task model from several pretrained models without further fine-tuning, and it has gained attention in various domains, including natural language processing (NLP). Despite the efficiency, a key challenge in model merging is the seemingly inevitable decrease in task performance as the number of models increases. In this paper, we propose Spectral Truncation And Rescale (STAR), which aims to mitigate "merging conflicts" by truncating small components in the respective spectral spaces, followed by an automatic parameter rescaling scheme that retains the nuclear norm of the original matrix. STAR requires no additional inference on the original training data and is robust to hyperparameter choice. We demonstrate the effectiveness of STAR through extensive model merging experiments on diverse NLP tasks. Specifically, STAR works robustly across varying model sizes and can outperform baselines by 4.2% when merging 12 models on Flan-T5. Our code is publicly available at https://github.com/IBM/STAR.
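
As a rough sketch of the core operation the abstract describes (not the authors' released implementation), the numpy snippet below truncates the smallest singular values of a weight matrix and rescales the survivors so the nuclear norm of the original matrix is retained; the keep_ratio knob is a hypothetical stand-in for STAR's automatic rank selection.

```python
import numpy as np

def spectral_truncate_rescale(W, keep_ratio=0.5):
    """Truncate small spectral components of a weight matrix, then rescale
    the kept singular values so the nuclear norm (sum of singular values)
    matches the original. keep_ratio is an illustrative knob; STAR selects
    the truncation point automatically."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    k = max(1, int(len(s) * keep_ratio))
    s_kept = s[:k] * (s.sum() / s[:k].sum())  # rescale to retain nuclear norm
    return (U[:, :k] * s_kept) @ Vt[:k]

# The merging recipe the abstract outlines: process each model's matrices
# this way before combining them, e.g.
# merged = np.mean([spectral_truncate_rescale(W) for W in task_matrices], axis=0)
```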


Rethink Model Re-Basin and the Linear Mode Connectivity

Qu, Xingyu, Horvath, Samuel

arXiv.org Artificial Intelligence

Recent studies suggest that with sufficiently wide models, most SGD solutions can, up to permutation, converge into the same basin. This phenomenon, known as the model re-basin regime, has significant implications for model averaging. However, current re-basin strategies are limited in effectiveness due to a lack of comprehensive understanding of the underlying mechanisms. Addressing this gap, our work revisits standard practices and uncovers the frequent inadequacies of existing matching algorithms, which we show can be mitigated through proper re-normalization. By introducing a more direct analytical approach, we expose the interaction between matching algorithms and re-normalization processes. This perspective not only clarifies and refines previous findings but also facilitates novel insights. For instance, it connects linear mode connectivity to pruning, motivating a lightweight yet effective post-pruning plug-in that can be directly merged with any existing pruning techniques. Our implementation is available at https://github.com/XingyuQu/rethink-re-basin.
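
To make the permutation idea concrete, here is a minimal weight-matching sketch for a one-hidden-layer network, using the Hungarian algorithm from scipy. This is a common heuristic illustration of re-basin matching, not the paper's full matching-plus-re-normalization procedure.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_hidden_units(W1_a, W1_b):
    """Find the permutation of model B's hidden units that best aligns them
    with model A's, by maximizing total weight similarity (Hungarian
    algorithm on a similarity matrix, a common weight-matching heuristic)."""
    sim = W1_a @ W1_b.T                    # sim[i, j]: A's unit i vs B's unit j
    _, perm = linear_sum_assignment(-sim)  # maximize total similarity
    return perm

def rebasin_average(W1_a, W2_a, W1_b, W2_b):
    """Permute B's hidden units (rows of W1, matching columns of W2) so the
    network function is unchanged, then average with A."""
    perm = match_hidden_units(W1_a, W1_b)
    return (W1_a + W1_b[perm]) / 2, (W2_a + W2_b[:, perm]) / 2
```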


A decomposition of book structure through ousiometric fluctuations in cumulative word-time

Fudolig, Mikaela Irene, Alshaabi, Thayer, Cramer, Kathryn, Danforth, Christopher M., Dodds, Peter Sheridan

arXiv.org Artificial Intelligence

While quantitative methods have been used to examine changes in word usage in books, studies have focused on overall trends, such as the shapes of narratives, which are independent of book length. We instead look at how words change over the course of a book as a function of the number of words, rather than the fraction of the book, completed at any given point; we define this measure as "cumulative word-time". Using ousiometrics, a reinterpretation of the valence-arousal-dominance framework of meaning obtained from semantic differentials, we convert text into time series of power and danger scores in cumulative word-time. Each time series is then decomposed using empirical mode decomposition into a sum of constituent oscillatory modes and a non-oscillatory trend. By comparing the decomposition of the original power and danger time series with those derived from shuffled text, we find that shorter books exhibit only a general trend, while longer books have fluctuations in addition to the general trend. These fluctuations typically have a period of a few thousand words regardless of the book length or library classification code, but vary depending on the content and structure of the book. Our findings suggest that, in the ousiometric sense, longer books are not expanded versions of shorter books, but are more similar in structure to a concatenation of shorter texts. Further, they are consistent with editorial practices that require longer texts to be broken down into sections, such as chapters. Our method also provides a data-driven denoising approach that works for texts of various lengths, in contrast to the more traditional approach of using large window sizes that may inadvertently smooth out relevant information, especially for shorter texts. These results open up avenues for future work in computational literary analysis, particularly the measurement of a basic unit of narrative.
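
A minimal sketch of the decomposition step, assuming the PyEMD package (EMD-signal on PyPI) and a synthetic stand-in for the per-word power scores; the paper's actual pipeline starts from ousiometric scores of real book text.

```python
import numpy as np
# PyEMD ("pip install EMD-signal") is one common implementation of empirical
# mode decomposition; its use here is an assumption, not named by the paper.
from PyEMD import EMD

# Hypothetical per-word "power" scores indexed by cumulative word-time:
# a slow trend, a few-thousand-word oscillation, and noise.
rng = np.random.default_rng(0)
t = np.arange(20000)
scores = 1e-4 * t + 0.1 * np.sin(2 * np.pi * t / 3000) \
         + rng.normal(scale=0.05, size=t.size)

emd = EMD()
emd(scores)
imfs, trend = emd.get_imfs_and_residue()  # oscillatory modes + non-oscillatory trend

# Comparison with shuffled text: shuffling destroys word order, so the
# structured few-thousand-word modes should vanish from the decomposition.
emd_shuf = EMD()
emd_shuf(rng.permutation(scores))
imfs_shuf, trend_shuf = emd_shuf.get_imfs_and_residue()
```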


Nvidia, Rescale team to enhance AI cloud automation and HPC-as-a-service

#artificialintelligence

Nvidia and Rescale today announced several enhancements designed to simplify artificial intelligence (AI) development and optimize high-performance computing (HPC) workflows. Nvidia is powering a new AI compute recommendation engine (CRE) that replaces a more manually tuned approach. Both developments promise to make it easier to spin up new scientific workloads and operate them more efficiently, and they apply equally to public cloud services and private cloud infrastructure.


Nvidia Adds Rescale Software Stack to AI Cloud Computing

#artificialintelligence

Nvidia does not have all the internal pieces to build out its massive AI computing empire, so it is enlisting software and hardware partners to scale its so-called AI factories in the cloud. The chipmaker's latest partnership is with Rescale, which provides middleware to orchestrate high-performance computing workloads on public and hybrid clouds. Rescale is adopting Nvidia's AI technology for what the middleware provider calls the world's first compute recommendation engine, which automates the selection of cloud compute resources for high-performance applications and simulation, said Edward Hsu, chief product officer at Rescale. "We have over 1,000 applications and versions from leading simulation software providers, we work with all the major cloud providers and specialized architecture providers to really optimize the performance of computational science and engineering workloads," Hsu said. Rescale will also offer customers Nvidia's AI Enterprise software platform, which includes pre-programmed AI models such as Modulus, used for scientific applications and simulation.


An Empirical Analysis of the Laplace and Neural Tangent Kernels

Lencevicius, Ronaldas Paulius

arXiv.org Artificial Intelligence

The neural tangent kernel is a kernel function defined over the parameter distribution of an infinite-width neural network. Despite the impracticality of this limit, the neural tangent kernel has allowed for a more direct study of neural networks and a gaze through the veil of their black box. More recently, it has been shown theoretically that the Laplace kernel and neural tangent kernel share the same reproducing kernel Hilbert space on the sphere $\mathbb{S}^{d-1}$, alluding to their equivalence. In this work, we analyze the practical equivalence of the two kernels. We first do so by matching the kernels exactly and then by matching the posteriors of a Gaussian process. Moreover, we analyze the kernels in $\mathbb{R}^d$ and experiment with them on the task of regression.
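
A small numpy sketch of the comparison: the Laplace kernel alongside an empirical finite-width NTK for a one-hidden-layer ReLU network, with the parameter gradients written out by hand. The width, bandwidth, and architecture are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def laplace_kernel(X, Y, sigma=1.0):
    # k(x, y) = exp(-||x - y|| / sigma)
    d = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    return np.exp(-d / sigma)

def empirical_ntk(X, Y, width=8192, seed=0):
    """Empirical NTK of f(x) = a . relu(W x) / sqrt(width) at random init,
    with the parameter gradients written out by hand (no autograd):
      df/da_i  = relu(w_i . x) / sqrt(width)
      df/dW_ij = a_i * 1[w_i . x > 0] * x_j / sqrt(width)"""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(width, X.shape[1]))
    a = rng.normal(size=width)
    def grads(Z):
        pre = Z @ W.T
        return np.maximum(pre, 0) / np.sqrt(width), (pre > 0) * a / np.sqrt(width)
    ga_x, gw_x = grads(X)
    ga_y, gw_y = grads(Y)
    # Inner product of gradients: a-block plus W-block (the latter factors
    # into a gate term times x . y).
    return ga_x @ ga_y.T + (gw_x @ gw_y.T) * (X @ Y.T)

# Compare the kernels on unit-norm inputs, where the theoretical RKHS
# equivalence on the sphere is stated.
X = np.random.default_rng(1).normal(size=(5, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
print(laplace_kernel(X, X))
print(empirical_ntk(X, X))
```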


GitHub - esimov/caire: Content aware image resize library

#artificialintelligence

Caire is a content aware image resize library based on the paper Seam Carving for Content-Aware Image Resizing. First, install Go, set your GOPATH, and make sure $GOPATH/bin is on your PATH. The library can also be installed via Homebrew. Caire can detect human faces prior to resizing images, using the lightweight Pigo library (https://github.com/esimov/pigo). The image below illustrates the application's face-detection capabilities prior to resizing.
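
For readers unfamiliar with the underlying technique, here is a compact Python sketch of the seam-carving idea that Caire implements in Go: compute a gradient-magnitude energy map, find the cheapest 8-connected vertical seam by dynamic programming, and remove it. This illustrates the algorithm only; it is not Caire's API.

```python
import numpy as np

def energy_map(gray):
    # Gradient-magnitude energy: pixels in smooth regions are cheap to remove.
    gy, gx = np.gradient(gray.astype(float))
    return np.abs(gx) + np.abs(gy)

def min_seam(energy):
    """Dynamic programming: cost[i, j] is the cheapest path from the top
    row to pixel (i, j), moving down or diagonally by one column."""
    h, w = energy.shape
    cost = energy.copy()
    back = np.zeros((h, w), dtype=int)
    for i in range(1, h):
        for j in range(w):
            lo, hi = max(j - 1, 0), min(j + 2, w)
            k = lo + int(np.argmin(cost[i - 1, lo:hi]))
            back[i, j] = k
            cost[i, j] += cost[i - 1, k]
    seam = np.zeros(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for i in range(h - 2, -1, -1):      # trace the path back to the top
        seam[i] = back[i + 1, seam[i + 1]]
    return seam

def remove_seam(gray, seam):
    h, w = gray.shape
    mask = np.ones((h, w), dtype=bool)
    mask[np.arange(h), seam] = False
    return gray[mask].reshape(h, w - 1)

# Shrink a grayscale image by one column:
# img = remove_seam(img, min_seam(energy_map(img)))
```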


Rescale Closes $105 Million in Expanded Series C Funding

#artificialintelligence

SAN FRANCISCO, Nov. 23, 2021 (GLOBE NEWSWIRE) -- Rescale, the leading hybrid cloud high performance computing (HPC) platform enabling intelligent computing for digital R&D, today announced it has closed $105 million in an expanded Series C funding round. Existing and new investors in the company include Sam Altman, Jeff Bezos, Richard Branson, Paul Graham, Peter Thiel, Fort Ross Ventures, Gaingels, Gopher, Hitachi Ventures, Initialized Capital, Keen Venture Partners, Microsoft M12, Nautilus Venture Partners, NVIDIA, Prometheus Capital, Republic Labs, Samsung Catalyst Fund, Solasta Ventures, Yield Capital Partners and more. The valuation was not disclosed. Rescale's announcement follows a dramatic acceleration in customer demand, investor interest and market momentum, bringing the company's total funding to date to over $155 million. With over 200 enterprise customers and year-over-year sales growing over 2x in 2021, Rescale is accelerating the digital transformation of the computational science and engineering discipline, which has traditionally run on-premises in private data centers but is rapidly shifting to the cloud.


CDF Transform-and-Shift: An effective way to deal with datasets of inhomogeneous cluster densities

Zhu, Ye, Ting, Kai Ming, Carman, Mark, Angelova, Maia

arXiv.org Artificial Intelligence

The problem of inhomogeneous cluster densities has been a long-standing issue for distance-based and density-based algorithms in clustering and anomaly detection. These algorithms implicitly assume that all clusters have approximately the same density. As a result, they often exhibit a bias towards dense clusters in the presence of sparse clusters. Many remedies have been suggested; yet, we show that they are partial solutions which do not address the issue satisfactorily. To match the implicit assumption, we propose to transform a given dataset such that the transformed clusters have approximately the same density while all regions of locally low density become globally low density -- homogenising cluster density while preserving the cluster structure of the dataset. We show that this can be achieved by using a new multi-dimensional Cumulative Distribution Function in a transform-and-shift method. The method can be applied to every dataset, before the dataset is used in many existing algorithms to match their implicit assumption without algorithmic modification. We show that the proposed method performs better than existing remedies.
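
As a simplified illustration of the transform step, the snippet below maps each feature through its one-dimensional empirical CDF (a rank transform); the paper's method uses a multi-dimensional CDF and an additional shift step, which this sketch omits.

```python
import numpy as np

def cdf_transform(X):
    """Map each feature through its empirical CDF (rank / n), so every
    marginal becomes approximately uniform. This per-dimension version is
    a simplification of the paper's multi-dimensional CDF transform."""
    n = X.shape[0]
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)
    return (ranks + 1) / n

# Usage: transform once, then run any distance- or density-based clustering
# or anomaly-detection algorithm unmodified on the transformed data.
```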


Image Classification with TensorFlow

#artificialintelligence

This article is an end-to-end example of training, testing and saving a machine learning model for image classification using the TensorFlow Python package. TensorFlow is a machine learning (primarily deep learning) package developed and open-sourced by Google; when it was originally released, TensorFlow was a relatively low-level package for experienced users. However, in the last few years, and especially since the release of TensorFlow 2.0, it has been aimed at a wider range of users. A few years ago I ran a PoC with one of our developers that looked at running TensorFlow models offline on one of our mobile applications. While we found that it was possible, we also encountered a few challenges that made the solution quite fiddly. Roll forward to 2020 and TensorFlow has improved a lot; the latest version has greater integration with the Keras APIs, it's being extended to cover more of the data processing pipeline, and it has also branched out to support new languages, with TensorFlow.js bringing it to JavaScript.
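
A minimal sketch of the workflow the article describes (train, evaluate, save) using tf.keras; the dataset, layer sizes, and epoch count are illustrative choices, not the article's.

```python
import tensorflow as tf

# Load and normalize MNIST (chosen here only as a small standard dataset).
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0
x_test = x_test[..., None] / 255.0

# A tiny CNN classifier; sizes are illustrative, not tuned.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=1, validation_split=0.1)
print(model.evaluate(x_test, y_test))

# Save for later reuse; the on-disk format depends on the TF version.
model.save("mnist_classifier.keras")
```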