
Wavelet Flow: Fast Training of High Resolution Normalizing Flows

Neural Information Processing Systems

Normalizing flows are a class of probabilistic generative models which allow for both fast density computation and efficient sampling and are effective at modelling complex distributions like images. A drawback among current methods is their significant training cost, sometimes requiring months of GPU training time to achieve state-of-the-art results. This paper introduces Wavelet Flow, a multi-scale normalizing flow architecture based on wavelets. A Wavelet Flow has an explicit representation of signal scale that inherently includes models of lower resolution signals and conditional generation of higher resolution signals, i.e., super resolution. A major advantage of Wavelet Flow is the ability to construct generative models for high resolution data (e.g., 1024×1024 images) that are impractical with previous models.
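The multi-scale decomposition underlying this idea can be illustrated with a one-level 2-D Haar transform: it splits an image into a half-resolution approximation plus detail bands, and is exactly invertible, which is why modelling the low-resolution signal and the conditional details recovers a model of the full image. This is a minimal sketch of the wavelet step only, not the paper's flow architecture:

```python
import numpy as np

def haar_decompose(img):
    """One level of a 2-D Haar transform: split an image into a
    half-resolution average and three detail bands (invertible)."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    low = (a + b + c + d) / 2.0   # half-resolution approximation
    lh  = (a - b + c - d) / 2.0   # horizontal detail
    hl  = (a + b - c - d) / 2.0   # vertical detail
    hh  = (a - b - c + d) / 2.0   # diagonal detail
    return low, (lh, hl, hh)

def haar_reconstruct(low, details):
    """Exact inverse: recover the full-resolution image from the
    low-resolution band and the details (i.e., super resolution
    once the details are generated conditionally)."""
    lh, hl, hh = details
    a = (low + lh + hl + hh) / 2.0
    b = (low - lh + hl - hh) / 2.0
    c = (low + lh - hl - hh) / 2.0
    d = (low - lh - hl + hh) / 2.0
    out = np.empty((2 * low.shape[0], 2 * low.shape[1]))
    out[0::2, 0::2] = a; out[0::2, 1::2] = b
    out[1::2, 0::2] = c; out[1::2, 1::2] = d
    return out
```

Applying the decomposition recursively yields the pyramid of scales that a Wavelet Flow models, one conditional flow per level.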


This Oscar Season's Great Underdog Story Is the Story of a Cat

Slate

Just as there are cat people and dog people, there are cat filmmakers and dog filmmakers. Sean Baker, who brought his pup Bunsen to Cannes along with his Palme d'Or–winning Anora, is a dog filmmaker. That's not to say one can't appreciate both, whether we're talking movies or pets. I myself am a cat person who currently has two dogs. But I think that on some level you are either drawn primarily to the sly, withholding spirit of cat movies or the energetic emotionality of dog movies, and nothing can alter that fundamental orientation.


Language in the Flow of Time: Time-Series-Paired Texts Weaved into a Unified Temporal Narrative

Li, Zihao, Lin, Xiao, Liu, Zhining, Zou, Jiaru, Wu, Ziwei, Zheng, Lecheng, Fu, Dongqi, Zhu, Yada, Hamann, Hendrik, Tong, Hanghang, He, Jingrui

arXiv.org Artificial Intelligence

While many advances in time series models focus exclusively on numerical data, research on multimodal time series, particularly those involving contextual textual information commonly encountered in real-world scenarios, remains in its infancy. Consequently, effectively integrating the text modality remains challenging. In this work, we highlight an intuitive yet significant observation that has been overlooked by existing works: time-series-paired texts exhibit periodic properties that closely mirror those of the original time series. Building on this insight, we propose a novel framework, Texts as Time Series (TaTS), which considers the time-series-paired texts to be auxiliary variables of the time series. TaTS can be plugged into any existing numerical-only time series model and enables it to handle time series data with paired texts effectively. Through extensive experiments on both multimodal time series forecasting and imputation tasks across benchmark datasets with various existing time series models, we demonstrate that TaTS consistently enhances predictive performance without modifying model architectures.
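The "texts as auxiliary variables" idea can be sketched as embedding each step's paired text and concatenating it to the numeric channels, so any numerical-only model simply sees a wider input. The hash-based encoder below is a hypothetical stand-in for whatever learned text encoder the framework would use:

```python
import numpy as np

def embed_texts(texts, dim=4):
    """Hypothetical stand-in for a text encoder: a hashed bag-of-words
    embedding (a real system would use a learned language-model encoder)."""
    out = np.zeros((len(texts), dim))
    for i, t in enumerate(texts):
        for w in t.lower().split():
            out[i, hash(w) % dim] += 1.0
    return out

def attach_text_channels(series, texts, dim=4):
    """TaTS-style idea: treat per-step paired texts as auxiliary variables
    by concatenating their embeddings to the numeric channels."""
    emb = embed_texts(texts, dim)
    return np.concatenate([series, emb], axis=1)  # shape (T, C + dim)

rng = np.random.default_rng(0)
series = rng.normal(size=(3, 2))      # T=3 steps, C=2 numeric channels
texts = ["demand rising", "holiday sale", "demand rising"]
x = attach_text_channels(series, texts)
print(x.shape)  # (3, 6)
```

Because the augmentation happens at the input, the downstream forecasting or imputation model needs no architectural change, matching the plug-in claim above.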


Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting

Sanyal, Sunny, Prairie, Hayden, Das, Rudrajit, Kavis, Ali, Sanghavi, Sujay

arXiv.org Machine Learning

Fine-tuning a pre-trained model on a downstream task often degrades its original capabilities, a phenomenon known as "catastrophic forgetting". This is especially an issue when one does not have access to the data and recipe used to develop the pre-trained model. Under this constraint, most existing methods for mitigating forgetting are inapplicable. To address this challenge, we propose a sample weighting scheme for the fine-tuning data solely based on the pre-trained model's losses. Specifically, we upweight the easy samples on which the pre-trained model's loss is low and vice versa to limit the drift from the pre-trained model. Our approach is orthogonal and yet complementary to existing methods; while such methods mostly operate on parameter or gradient space, we concentrate on the sample space. We theoretically analyze the impact of fine-tuning with our method in a linear setting, showing that it stalls learning in a certain subspace which inhibits overfitting to the target task. We empirically demonstrate the efficacy of our method on both language and vision tasks. As an example, when fine-tuning Gemma 2 2B on MetaMathQA, our method results in only a $0.8\%$ drop in accuracy on GSM8K (another math dataset) compared to standard fine-tuning, while preserving $5.4\%$ more accuracy on the pre-training datasets. Our code is publicly available at https://github.com/sanyalsunny111/FLOW_finetuning .
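The weighting scheme described above can be sketched as follows. A softmax over negative pre-trained losses is one plausible instantiation of "upweight the easy samples"; the paper's exact formula may differ, and `temperature` is an illustrative knob:

```python
import numpy as np

def easy_sample_weights(pretrained_losses, temperature=1.0):
    """Upweight fine-tuning samples the pre-trained model already finds
    easy (low loss), to limit drift from the pre-trained model."""
    losses = np.asarray(pretrained_losses, dtype=float)
    logits = -losses / temperature
    logits -= logits.max()            # numerical stability
    w = np.exp(logits)
    return w / w.sum()                # weights sum to 1

# Per-sample losses of the pre-trained model on the fine-tuning data:
w = easy_sample_weights([0.1, 0.5, 3.0])
# The fine-tuning objective would then be sum_i w[i] * finetune_loss_i.
```

Note that the weights depend only on the pre-trained model's losses, so the method needs no access to the original pre-training data or recipe, as the abstract emphasizes.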


AgentInstruct: Toward Generative Teaching with Agentic Flows

Mitra, Arindam, Del Corro, Luciano, Zheng, Guoqing, Mahajan, Shweti, Rouhana, Dany, Codas, Andres, Lu, Yadong, Chen, Wei-ge, Vrousgos, Olga, Rosset, Corby, Silva, Fillipe, Khanpour, Hamed, Lara, Yash, Awadallah, Ahmed

arXiv.org Artificial Intelligence

Synthetic data is becoming increasingly important for accelerating the development of language models, both large and small. Despite several successful use cases, researchers have also raised concerns around model collapse and the drawbacks of imitating other models. This discrepancy can be attributed to the fact that synthetic data varies in quality and diversity. Effective use of synthetic data usually requires significant human effort in curating the data. We focus on using synthetic data for post-training, specifically creating data with powerful models to teach a new skill or behavior to another model; we refer to this setting as Generative Teaching. We introduce AgentInstruct, an extensible agentic framework for automatically creating large amounts of diverse and high-quality synthetic data. AgentInstruct can create both the prompts and responses, using only raw data sources like text documents and code files as seeds. We demonstrate the utility of AgentInstruct by creating a post-training dataset of 25M pairs to teach language models different skills, such as text editing, creative writing, tool usage, coding, reading comprehension, etc. The dataset can be used for instruction tuning of any base model. We post-train Mistral-7b with the data. When comparing the resulting model Orca-3 to Mistral-7b-Instruct (which uses the same base model), we observe significant improvements across many benchmarks. For example, 40% improvement on AGIEval, 19% improvement on MMLU, 54% improvement on GSM8K, 38% improvement on BBH and 45% improvement on AlpacaEval. Additionally, it consistently outperforms other models such as LLAMA-8B-instruct and GPT-3.5-turbo.


Flow: Per-Instance Personalized Federated Learning Through Dynamic Routing

Panchal, Kunjal, Choudhary, Sunav, Guan, Hui

arXiv.org Artificial Intelligence

Personalization in Federated Learning (FL) aims to adapt a collaboratively trained global model to each client. Current approaches to personalization in FL operate at a coarse granularity, i.e., all the input instances of a client use the same personalized model. This ignores the fact that some instances are more accurately handled by the global model due to its better generalizability. To address this challenge, this work proposes Flow, a fine-grained stateless personalized FL approach. Flow creates dynamic personalized models by learning a routing mechanism that determines whether an input instance prefers the local parameters or their global counterpart. Thus, Flow introduces per-instance routing in addition to leveraging per-client personalization to improve accuracy at each client. Further, Flow is stateless, which makes it unnecessary for a client to retain its personalized state across FL rounds. This makes Flow practical for large-scale FL settings and friendly to newly joined clients. Evaluations on the Stackoverflow, Reddit, and EMNIST datasets demonstrate that Flow outperforms state-of-the-art non-personalized and per-client personalized FL approaches in prediction accuracy.
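Per-instance routing can be sketched as a learned gate that mixes local (personalized) and global parameters for each input separately. The soft sigmoid gate and all names below are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

rng = np.random.default_rng(0)

class RoutedLinear:
    """Sketch of per-instance routing between local and global parameters."""
    def __init__(self, d_in, d_out):
        self.w_global = rng.normal(size=(d_in, d_out))   # shared across clients
        self.w_local = self.w_global + 0.1 * rng.normal(size=(d_in, d_out))
        self.gate = rng.normal(size=d_in)                # routing parameters

    def forward(self, x):
        # Per-instance gate in (0, 1): how much to trust the local parameters.
        g = 1.0 / (1.0 + np.exp(-x @ self.gate))
        y_local = x @ self.w_local
        y_global = x @ self.w_global
        return g[:, None] * y_local + (1.0 - g)[:, None] * y_global

layer = RoutedLinear(d_in=4, d_out=2)
y = layer.forward(rng.normal(size=(5, 4)))   # 5 instances, each routed separately
print(y.shape)  # (5, 2)
```

Because the gate is a function of the input alone, instances that the global model handles better can bypass personalization, which is the coarse-granularity limitation the abstract points out.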


Flows for Flows: Training Normalizing Flows Between Arbitrary Distributions with Maximum Likelihood Estimation

Klein, Samuel, Raine, John Andrew, Golling, Tobias

arXiv.org Artificial Intelligence

Normalizing flows are constructed from a base distribution with a known density and a diffeomorphism with a tractable Jacobian. The base density of a normalizing flow can be parameterised by a different normalizing flow, thus allowing maps to be found between arbitrary distributions. We demonstrate and explore the utility of this approach and show it is particularly interesting in the case of conditional normalizing flows and for introducing optimal transport constraints on maps that are constructed using normalizing flows.
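The construction rests on the change-of-variables formula: chaining two flows simply adds their log-Jacobian terms, so the "base" of the outer flow can itself be another flow. A minimal 1-D sketch with affine maps (the paper uses far richer diffeomorphisms):

```python
import numpy as np

class AffineFlow:
    """A 1-D affine flow z = (x - b) / a, mapping data toward its base."""
    def __init__(self, a, b):
        self.a, self.b = a, b
    def forward(self, x):
        # Returns the base-space point and log|det J| of the map.
        return (x - self.b) / self.a, -np.log(abs(self.a))

def log_std_normal(z):
    return -0.5 * (z ** 2 + np.log(2 * np.pi))

def log_density(x, flows):
    """Change of variables chained through several flows. The outermost
    base here is a standard normal, but any inner flow's base could
    itself be parameterised by another flow."""
    logdet = 0.0
    for f in flows:
        x, ld = f.forward(x)
        logdet += ld
    return log_std_normal(x) + logdet

# Flow 1 maps distribution A toward distribution B; flow 2 maps B to N(0, 1).
# Composing them gives an exact density on A, trainable by maximum likelihood.
lp = log_density(1.7, [AffineFlow(2.0, 0.5), AffineFlow(1.5, -0.2)])
```

Training the outer flow by maximum likelihood against a flow-parameterised base is what lets maps be found between two arbitrary distributions, each represented by its own flow.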


Pseudo-OOD training for robust language models

Sundararaman, Dhanasekar, Mehta, Nikhil, Carin, Lawrence

arXiv.org Artificial Intelligence

Detecting Out-of-Distribution (OOD) samples (Goodfellow et al., 2014; Hendrycks and Gimpel, 2016; Yang et al., 2021) is vital for developing reliable machine learning systems for various industry-scale applications of natural language understanding (NLP) (Shen et al., 2019; Sundararaman et al., 2020), including intent understanding in conversational dialogues (Zheng et al., 2020; Li et al., 2017), language translation (Denkowski and Lavie, 2011; Sundararaman et al., 2019), and text classification (Aggarwal and Zhai, 2012; Sundararaman et al., 2022). For instance, a language understanding model deployed to support a chat …

Motivated by the above limitations, we propose a framework called POsthoc pseudo Ood REgularization (POORE) that generates pseudo-OOD data using the trained classifier and the In-Distribution (IND) samples. As opposed to methods that use outlier exposure, our framework doesn't rely on any external OOD set. Moreover, POORE can be easily applied to already deployed large-scale models trained on a classification task, without requiring re-training of the classifier from scratch. In summary, we make the following contributions: 1. We propose a Mahalanobis-based context masking scheme for generating pseudo-OOD samples that can be used during finetuning.
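The Mahalanobis-based masking contribution can be sketched as scoring each token's feature vector by its Mahalanobis distance to the in-distribution statistics and masking some tokens to push the context off-distribution. The selection rule, token names, and ridge term below are illustrative assumptions; the paper's actual POORE scheme may select and mask differently:

```python
import numpy as np

def mahalanobis(x, mean, cov_inv):
    """Squared Mahalanobis distance of a feature vector to IND statistics."""
    d = x - mean
    return float(d @ cov_inv @ d)

def pseudo_ood_mask(token_feats, ind_mean, ind_cov_inv, k=2):
    """Illustrative masking: mask the k tokens closest to the in-distribution
    statistics, so the remaining context drifts off-distribution."""
    dists = [mahalanobis(t, ind_mean, ind_cov_inv) for t in token_feats]
    order = np.argsort(dists)            # most in-distribution first
    masked = set(order[:k])
    return ["[MASK]" if i in masked else f"tok{i}" for i in range(len(token_feats))]

rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 3))          # features of 5 context tokens
mean = feats.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(feats.T) + 1e-3 * np.eye(3))  # ridge for stability
print(pseudo_ood_mask(feats, mean, cov_inv))
```

The appeal of this style of scheme, as the abstract notes, is that everything is computed from the trained classifier and IND samples alone, so no external OOD set or retraining is required.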