Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models
Pan Lu, Michel Galley

Neural Information Processing Systems

Large language models (LLMs) have achieved remarkable progress in solving various natural language processing tasks due to emergent reasoning abilities. However, LLMs have inherent limitations as they are incapable of accessing up-to-date information (stored on the Web or in task-specific knowledge bases), using external tools, and performing precise mathematical and logical reasoning.
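The plug-and-play idea can be pictured as an LLM planner that selects a sequence of external modules (web search, calculator, and so on) and chains their outputs. The sketch below is a minimal illustration under hypothetical module names and a placeholder `plan_with_llm` call; it is not Chameleon's actual interface.

```python
# Minimal sketch of plug-and-play tool composition (hypothetical module names,
# not Chameleon's actual API). A planner proposes a tool plan; each tool
# consumes the running context and appends its result.

def web_search(ctx):        # assumed stand-in for an up-to-date knowledge tool
    ctx["evidence"] = f"search results for: {ctx['question']}"
    return ctx

def calculator(ctx):        # assumed stand-in for precise arithmetic
    ctx["answer"] = eval(ctx.get("expression", "0"))  # toy arithmetic only
    return ctx

TOOLS = {"web_search": web_search, "calculator": calculator}

def plan_with_llm(question):
    """Placeholder for an LLM-generated tool plan (here: a fixed plan)."""
    return ["web_search", "calculator"]

def run(question, expression="2 * 21"):
    ctx = {"question": question, "expression": expression}
    for name in plan_with_llm(question):   # execute the composed program
        ctx = TOOLS[name](ctx)
    return ctx["answer"]

print(run("What is twice 21?"))  # -> 42
```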



One Fits All: Power General Time Series Analysis by Pretrained LM
Tian Zhou, Xue Wang

Neural Information Processing Systems

Although we have witnessed great success of pre-trained models in natural language processing (NLP) and computer vision (CV), limited progress has been made for general time series analysis. Unlike NLP and CV, where a unified model can be used to perform different tasks, specially designed approaches still dominate each time series analysis task, such as classification, anomaly detection, forecasting, and few-shot learning. The main challenge that blocks the development of pre-trained models for time series analysis is the lack of a large amount of data for training. In this work, we address this challenge by leveraging language or CV models, pre-trained from billions of tokens, for time series analysis. Specifically, we refrain from altering the self-attention and feedforward layers of the residual blocks in the pre-trained language or image model. This model, known as the Frozen Pretrained Transformer (FPT), is evaluated through fine-tuning on all major types of tasks involving time series. Our results demonstrate that models pre-trained on natural language or images can lead to comparable or state-of-the-art performance in all main time series analysis tasks, as illustrated in Figure 1. We also find, both theoretically and empirically, that the self-attention module behaves similarly to principal component analysis (PCA), an observation that helps explain how the transformer bridges the domain gap and is a crucial step towards understanding the universality of a pre-trained transformer.
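The key recipe, freezing the self-attention and feedforward blocks of a pretrained language model and fine-tuning only the remaining lightweight parts plus a task head, can be sketched as follows. This assumes a HuggingFace GPT-2 backbone and a simple linear forecasting head; it illustrates the freezing scheme rather than the authors' released code.

```python
# Sketch: freeze attention + FFN of a pretrained GPT-2, tune the rest
# (embeddings, layer norms) plus a small forecasting head. Assumes the
# HuggingFace `transformers` package; hyper-parameters are placeholders.
import torch
import torch.nn as nn
from transformers import GPT2Model

class FrozenPretrainedTransformer(nn.Module):
    def __init__(self, input_dim=1, horizon=24, d_model=768):
        super().__init__()
        self.backbone = GPT2Model.from_pretrained("gpt2")
        for name, p in self.backbone.named_parameters():
            # keep self-attention and feedforward weights frozen
            if "attn" in name or "mlp" in name:
                p.requires_grad = False
        self.in_proj = nn.Linear(input_dim, d_model)   # time series -> token space
        self.head = nn.Linear(d_model, horizon)        # forecasting head

    def forward(self, x):                 # x: (batch, seq_len, input_dim)
        h = self.in_proj(x)
        h = self.backbone(inputs_embeds=h).last_hidden_state
        return self.head(h[:, -1])        # predict the next `horizon` steps

model = FrozenPretrainedTransformer()
y = model(torch.randn(4, 96, 1))          # -> (4, 24)
```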



Structured Federated Learning through Clustered Additive Modeling

Neural Information Processing Systems

Heterogeneous federated learning without assuming any structure is challenging due to the conflicts among non-identical data distributions of clients. In practice, clients often comprise near-homogeneous clusters, so training a server-side model per cluster mitigates the conflicts. However, FL with client clustering often suffers from "clustering collapse": one cluster's model excels on an increasing share of clients, and the scheme reduces to single-model FL. Moreover, cluster-wise models hinder knowledge sharing between clusters, and each model depends on fewer clients. Furthermore, the static clustering assumption on data may not hold for dynamically changing models, which are sensitive to cluster imbalance/initialization or outliers. To address these challenges, we propose "Clustered Additive Modeling (CAM)", which applies a globally shared model Θ together with cluster-wise models in an additive manner.
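The additive structure can be read as: a client in cluster k predicts with the sum of the shared global model and its cluster's model. Below is a minimal sketch under assumed toy models; CAM's actual parameterization, clustering update, and server aggregation are not reproduced here.

```python
# Sketch of the clustered-additive prediction: y = global(x) + cluster_k(x).
# Toy linear models for illustration; not the paper's implementation.
import torch
import torch.nn as nn

class CAMPredictor(nn.Module):
    def __init__(self, dim=16, num_clusters=3, out_dim=1):
        super().__init__()
        self.global_model = nn.Linear(dim, out_dim)            # shared model Θ
        self.cluster_models = nn.ModuleList(
            [nn.Linear(dim, out_dim) for _ in range(num_clusters)]  # per-cluster models
        )

    def forward(self, x, cluster_id):
        # additive combination: shared knowledge + cluster-specific correction
        return self.global_model(x) + self.cluster_models[cluster_id](x)

model = CAMPredictor()
y = model(torch.randn(8, 16), cluster_id=1)   # predictions for a cluster-1 client
```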


Supplementary Materials for the Paper "L2T-DLN: Learning to Teach with Dynamic Loss Network"

Neural Information Processing Systems

In this supplementary material, we provide the proofs of the convergence analysis in Section 1, the 1-vs-1 transformation employed in the classification and semantic segmentation tasks in Section 2, the coordinate-wise processing and the preprocessing method of the LSTM teacher in Section 3, the loss functions of YOLO-v3 in Section 4, more image classification experiments in Section 5, and the semantic segmentation inference results in Section 6. A differentiable function e(·) is L-smooth with gradient Lipschitz constant C (uniformly Lipschitz continuous) if ‖∇e(x) − ∇e(y)‖ ≤ C‖x − y‖ for all x, y. If ‖∇e(x)‖ ≤ ϵ, then x is an ϵ-first-order stationary point (SS1). For a differentiable function e(·), if x is an SS1 and there exists ϵ > 0 such that e(x) ≤ e(y) for any y in the ϵ-neighborhood of x, then x is a local minimum. A saddle point x is an SS1 that is not a local minimum.
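As a concrete instance of these definitions (our illustration, not taken from the supplementary material), a simple quadratic already exhibits all three notions:

```latex
% Worked example of the definitions above (illustrative, not from the paper).
% For e(x) = (1/2)||x||^2 we have \nabla e(x) = x, so the gradient is
% 1-Lipschitz, i.e. e is L-smooth with constant C = 1. Any x with
% ||x|| <= \epsilon is an \epsilon-first-order stationary point (SS1),
% and x = 0 is an SS1 that is also a local (indeed global) minimum,
% hence not a saddle point.
\[
  e(x) = \tfrac{1}{2}\lVert x\rVert^{2}, \qquad
  \nabla e(x) = x, \qquad
  \lVert\nabla e(x) - \nabla e(y)\rVert = \lVert x - y\rVert \le 1\cdot\lVert x - y\rVert .
\]
```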



UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild

Neural Information Processing Systems

Achieving machine autonomy and human control often represent divergent objectives in the design of interactive AI systems. Visual generative foundation models such as Stable Diffusion show promise in navigating these goals, especially when prompted with arbitrary languages. However, they often fall short in generating images with spatial, structural, or geometric controls. The integration of such controls, which can accommodate various visual conditions in a single unified model, remains an unaddressed challenge. In response, we introduce UniControl, a new generative foundation model that consolidates a wide array of controllable condition-to-image (C2I) tasks within a singular framework, while still allowing for arbitrary language prompts. UniControl enables pixel-level-precise image generation, where visual conditions primarily influence the generated structures and language prompts guide the style and context. To equip UniControl with the capacity to handle diverse visual conditions, we augment pretrained text-to-image diffusion models and introduce a task-aware HyperNet to modulate the diffusion models, enabling the adaptation to different C2I tasks simultaneously. Trained on nine unique C2I tasks, UniControl demonstrates impressive zero-shot generation abilities with unseen visual conditions. Experimental results show that UniControl often surpasses the performance of single-task-controlled methods of comparable model sizes.
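The task-aware HyperNet can be pictured as a small network that maps a task embedding (e.g., edge map, depth, segmentation) to modulation parameters that rescale the condition features fed into the diffusion backbone. The sketch below is a generic FiLM-style modulation under assumed shapes and task names, not UniControl's released architecture.

```python
# Sketch: a task-aware hypernetwork producing per-channel scale/shift that
# modulates condition features before they enter a diffusion backbone.
# Shapes and task list are illustrative assumptions, not UniControl's code.
import torch
import torch.nn as nn

TASKS = ["canny", "depth", "segmentation"]          # assumed C2I task names

class TaskHyperNet(nn.Module):
    def __init__(self, num_tasks=len(TASKS), task_dim=64, cond_channels=320):
        super().__init__()
        self.task_embed = nn.Embedding(num_tasks, task_dim)
        self.to_scale_shift = nn.Linear(task_dim, 2 * cond_channels)

    def forward(self, cond_feat, task_id):
        # cond_feat: (batch, C, H, W) features extracted from the visual condition
        scale, shift = self.to_scale_shift(self.task_embed(task_id)).chunk(2, dim=-1)
        scale = scale[:, :, None, None]              # broadcast over spatial dims
        shift = shift[:, :, None, None]
        return cond_feat * (1 + scale) + shift       # task-modulated condition

hyper = TaskHyperNet()
task_id = torch.tensor([TASKS.index("depth")])
modulated = hyper(torch.randn(1, 320, 64, 64), task_id)   # -> (1, 320, 64, 64)
```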



Appendix A: Illustrative Examples

Neural Information Processing Systems

This appendix works through the infinitesimal generator v of the 2-dimensional rotation group SO(2). By the prolongation formula, Eq. (4), the first prolongation in t is given by ϕᵗ. As a final illustrative example of the symmetry criterion, we follow Olver's example below. This choice was based on previous research using PINNs and DeepONets for solving Burgers' equation. The output of the embedding vectors from both networks is 100-dimensional. For both the Heat equation and Burgers' equation experiments, we perform hyper-parameter tuning. We also note that for Burgers' equation, we found that cosine similarity for the loss L performed better; the results reported in Section 4 use cosine similarity. We will make the data and the code available on GitHub. The corresponding mean-squared errors are reported in Table 2.
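For reference, the textbook instance of this computation (Olver's rotation example, stated here from the standard reference rather than reproduced from the appendix, whose exact variables may differ) reads:

```latex
% Olver's classical SO(2) example of the prolongation formula.
% Generator of rotations in the (x, u)-plane:
%   v = -u \partial_x + x \partial_u,  i.e.  \xi = -u,  \phi = x.
% First prolongation via \phi^{x} = D_x(\phi - \xi u_x) + \xi u_{xx}:
\[
  \phi^{x} = D_x\!\left(x + u\,u_x\right) - u\,u_{xx}
           = 1 + u_x^{2} + u\,u_{xx} - u\,u_{xx}
           = 1 + u_x^{2},
\]
\[
  \operatorname{pr}^{(1)} v
  = -u\,\partial_x + x\,\partial_u + \left(1 + u_x^{2}\right)\partial_{u_x}.
\]
```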